Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder without human transcriptions to generate a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge targeted for knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our proposed approach, boosting the knowledge-seeking turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge document re-ranking, our approach shows significant improvement in all knowledge selection metrics, from 0.7358 to 0.7806 in Recall@1, from 0.8301 to 0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5 on the test set. In the recent DSTC10 evaluation, our approach demonstrates significant improvement in knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the official baseline. Our source code is released in GitHub https://github.com/yctam/dstc10_track2_task2.git.
translated by 谷歌翻译
最近,我们看到了照片真实的人类建模和渲染的神经进展取得的巨大进展。但是,将它们集成到现有的下游应用程序中的现有网络管道中仍然具有挑战性。在本文中,我们提出了一种全面的神经方法,用于从密集的多视频视频中对人类表演进行高质量重建,压缩和渲染。我们的核心直觉是用一系列高效的神经技术桥接传统的动画网格工作流程。我们首先引入一个神经表面重建器,以在几分钟内进行高质量的表面产生。它与多分辨率哈希编码的截短签名距离场(TSDF)的隐式体积渲染相结合。我们进一步提出了一个混合神经跟踪器来生成动画网格,该网格将明确的非刚性跟踪与自我监督框架中的隐式动态变形结合在一起。前者将粗糙的翘曲返回到规范空间中,而后者隐含的一个隐含物进一步预测了使用4D哈希编码的位移,如我们的重建器中。然后,我们使用获得的动画网格讨论渲染方案,从动态纹理到各种带宽设置下的Lumigraph渲染。为了在质量和带宽之间取得复杂的平衡,我们通过首先渲染6个虚拟视图来涵盖表演者,然后进行闭塞感知的神经纹理融合,提出一个分层解决方案。我们证明了我们方法在各种平台上的各种基于网格的应用程序和照片真实的自由观看体验中的功效,即,通过移动AR插入虚拟人类的表演,或通过移动AR插入真实环境,或带有VR头戴式的人才表演。
translated by 谷歌翻译
数十亿人每天都在社交媒体上分享他们的日常生活图像。但是,它们的生物识别信息(例如,指纹)可以很容易地从这些图像中偷走。从社交媒体上泄漏的指纹泄漏的威胁引起了人们对匿名分享图像的强烈渴望,同时保持图像质量,因为指纹充当了终生的个体生物识别密码。为了防止指纹泄漏,通过在图像上添加不可察觉的扰动来作为解决方案出现。但是,现有作品要么在黑盒可传输性方面弱,要么显得不自然。由视觉感知层次结构激励(即,高级感知利用模型共享的语义,这些语义在模型中很好地转移,而低水平的感知提取物则是原始刺激的,并且会引起高视觉敏感性的刺激),我们提出了一个层次的感知噪声,注射框架以解决上述问题。对于黑盒可传递性,我们在指纹方向场上注入保护性噪声,以扰动模型共享的高级语义(即指纹脊)。考虑到视觉自然性,我们通过正规化侧向基因核的响应来抑制低级局部对比度刺激。我们的Fingersafe是第一个在数字(最高94.12%)和现实的场景(Twitter和Facebook,高达68.75%)中提供可行的指纹保护的人。我们的代码可以在https://github.com/nlsde-safety-team/fingersafe上找到。
translated by 谷歌翻译
人群计数已被广泛用于估计安全至关重要的场景中的人数,被证明很容易受到物理世界中对抗性例子的影响(例如,对抗性斑块)。尽管有害,但对抗性例子也很有价值,对于评估和更好地理解模型的鲁棒性也很有价值。但是,现有的对抗人群计算的对抗性示例生成方法在不同的黑盒模型之间缺乏强大的可传递性,这限制了它们对现实世界系统的实用性。本文提出了与模型不变特征正相关的事实,本文提出了感知的对抗贴片(PAP)生成框架,以使用模型共享的感知功能来定制对对抗性的扰动。具体来说,我们将一种自适应人群密度加权方法手工制作,以捕获各种模型中不变的量表感知特征,并利用密度引导的注意力来捕获模型共享的位置感知。证明它们都可以提高我们对抗斑块的攻击性转移性。广泛的实验表明,我们的PAP可以在数字世界和物理世界中实现最先进的进攻性能,并且以大幅度的优于以前的提案(最多+685.7 MAE和+699.5 MSE)。此外,我们从经验上证明,对我们的PAP进行的对抗训练可以使香草模型的性能受益,以减轻人群计数的几个实际挑战,包括跨数据集的概括(高达-376.0 MAE和-376.0 MAE和-354.9 MSE)和对复杂背景的鲁棒性(上升)至-10.3 MAE和-16.4 MSE)。
translated by 谷歌翻译
最近,生成的数据无量子化作为一种​​实用的方法,将神经网络压缩到低位宽度而不访问真实数据。它通过利用其全精密对应物的批量归一化(BN)统计来生成数据来量化网络。然而,我们的研究表明,在实践中,BN统计的合成数据在分布和样品水平时严重均匀化,这导致量化网络的严重劣化。本文提出了一种通用不同的样本生成(DSG)方案,用于生成无数据的训练后量化和量化感知培训,以减轻有害的均质化。在我们的DSG中,我们首先将统计对齐缩写为BN层中的功能,以放宽分配约束。然后,我们加强特定BN层对不同样品的损失影响,并抑制了生成过程中样品之间的相关性,分别从统计和空间角度分别多样化样本。广泛的实验表明,对于大规模的图像分类任务,我们的DSG可以始终如一地优于各种神经结构上的现有数据无数据量化方法,尤其是在超低比特宽度下(例如,在W4A4设置下的22%的增益下)。此外,由我们的DSG引起的数据多样化引起了各种量化方法的一般增益,证明了多样性是无数据量化的高质量合成数据的重要特性。
translated by 谷歌翻译
最近的神经人类表示可以产生高质量的多视图渲染,但需要使用密集的多视图输入和昂贵的培训。因此,它们在很大程度上仅限于静态模型,因为每个帧都是不可行的。我们展示了人类学 - 一种普遍的神经表示 - 用于高保真自由观察动态人类的合成。类似于IBRNET如何通过避免每场景训练来帮助NERF,Humannerf跨多视图输入采用聚合像素对准特征,以及用于解决动态运动的姿势嵌入的非刚性变形场。原始人物员已经可以在稀疏视频输入的稀疏视频输入上产生合理的渲染。为了进一步提高渲染质量,我们使用外观混合模块增强了我们的解决方案,用于组合神经体积渲染和神经纹理混合的益处。各种多视图动态人类数据集的广泛实验证明了我们在挑战运动中合成照片 - 现实自由观点的方法和非常稀疏的相机视图输入中的普遍性和有效性。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译